(54) Speech recognition system with display for user's confirmation

(57) A speech recognizer system for use with a 
telecommunication network wherein an input 
signal generated onto the network from a first 
terminal is directed to a speech recognizer for 
estimating the verbal content of the input sig- 
nal. The speech recognizer or associated equip- 
ment then directs an estimate of the verbal 
content as an output signal back to the first 
terminal, the estimate including one or more 
approximations of the verbal content of the 
input signal. At the first terminal the user then 
confirms a correct estimate, or selects from a 
plurality of approximations, the verbal content
of the input signal. 

selection of an approximation.

Preferably, the first terminal 2 has a visual display 
20 and the output signal 8 is in a digital format when 
sent to the first terminal 2 to speed the movement of 
the output signal 8 on the network to the first terminal 
2. Preferably, the terminal will have a modem to
decode the digital signal for presentation. Alternatively,
the output signal 8 will be in DTMF, which can be used
with almost all current terminal systems and which is
decoded and presented in visual form at the terminal.
The display 20 of the terminal 2 will then decode the
signal 8 and visually present it to the user.

At the first terminal 2 the output signal 8 is re- 
ceived and at least a portion thereof, i.e. one approx- 
imation, is presented to the user for confirmation. In 
its preferred embodiment the most probable approx- 
imation will be displayed first, followed by the next 
most probable if the first is not selected or confirmed. 
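
By way of illustration only, the presentation order described above can be sketched as follows. The function name `confirm_estimate`, the probability scores and the `user_confirms` callback are hypothetical and are not part of the disclosed system; the callback stands in for a key press or spoken reply at the first terminal 2.

```python
from typing import Callable, Optional, Sequence, Tuple


def confirm_estimate(
    approximations: Sequence[Tuple[str, float]],
    user_confirms: Callable[[str], bool],
) -> Optional[str]:
    """Present approximations most probable first; return the confirmed one.

    Each approximation is a (text, probability) pair. The scores are
    illustrative assumptions; the source does not say how they are computed.
    """
    ranked = sorted(approximations, key=lambda item: item[1], reverse=True)
    for text, _score in ranked:
        # The next approximation is shown only if this one is not confirmed.
        if user_confirms(text):
            return text
    return None  # no approximation was confirmed


# Made-up estimate containing three approximations.
estimate = [("Jon Smith", 0.21), ("John Smith", 0.64), ("Joan Smythe", 0.15)]
chosen = confirm_estimate(estimate, lambda text: text == "Jon Smith")
```

In this sketch the user rejects the first approximation shown and confirms the second, so the loop stops before the least probable approximation is presented.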

Of course, although a visual display 20 is prefer- 
red, the presentation can be audio via a speaker 28, 
visual or both, with visual alone or in combination with 
audio being preferred in most instances due to the 
speed of presentation and reduced need for user 
alertness where the visual information can stay on a 
display 20 until the user wishes to remove it, by con- 
firming the correctness or selecting the next approx- 
imation. 

However, for car phones where the user has his 
eyes on the road or with telephones that do not have 
a visual display, audio presentation via speaker 28 is 
available alone or along with the visual display 20. 

In situations where an audio presentation is 
made, "barge-in" capability is especially important so 
a user does not need to wait for the end of the audio 
presentation to confirm a correct approximation or re- 
quest the next selection. The barge-in feature allows 
the user to make a confirmation or request another 
approximation by depressing a key on the keypad 24
or speaking into the microphone 22 during the
presentation, thereby terminating the presentation of the
previous approximation without having to listen to the 
entire presentation. 
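
The barge-in behaviour can be sketched, under simplifying assumptions, as a loop that checks for user input before each portion of the audio presentation is played. The names `present_with_barge_in` and `poll_user`, the division of the prompt into chunks and the reply string "confirm" are hypothetical; an actual terminal would detect a key on the keypad 24 or speech at the microphone 22 while audio is playing.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def present_with_barge_in(
    chunks: Iterable[str],
    poll_user: Callable[[], Optional[str]],
) -> Tuple[List[str], Optional[str]]:
    """Play audio chunks until the user barges in.

    Returns the chunks actually played and the user's response, or None
    if the whole presentation was played without interruption.
    """
    played: List[str] = []
    for chunk in chunks:
        response = poll_user()  # key press or speech, if any
        if response is not None:
            # Barge-in: stop at once, skipping the rest of the presentation.
            return played, response
        played.append(chunk)  # stands in for sending the chunk to speaker 28
    return played, None


# Simulated input: nothing during the first two chunks, then a confirmation.
events = iter([None, None, "confirm"])
played, response = present_with_barge_in(
    ["Did", "you", "say", "John", "Smith"],
    lambda: next(events, None),
)
```

Here the presentation is cut off after two chunks because the simulated user confirms before the third chunk is played.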

The preferred visual display 20 can be of any 
type, including a Caller I.D. where a line or more of al- 
phanumeric text is presented in an LCD display, a 
P.C. monitor, a CRT display, a vacuum fluorescent 
display, an LED display, a video telephone, a still im- 
age telephone, etc. 

In implementing the speech recognition system 
of the present invention, a communication protocol 
must be defined for transmitting the output signal 8 
from the network 1 to the first terminal 2. Definition 
of the protocol requires that the variety of possible ter- 
minal types and visual displays present in the net- 
work be taken into account. Several methods are cur- 
rently envisioned herein, including a bidirectional pro- 
tocol, a terminal specific protocol and a unidirectional 
protocol. 



A bidirectional protocol requires that the terminal
2 respond to the network prompt and describe the
capabilities of the terminal 2. The network can then
direct an output signal 8 to the terminal 2 which matches
the capabilities of the terminal 2. For example, if the
line interface 26 of the first terminal 2 has a high
speed modem, the output 8 will be sent faster using the
modem protocol. If the terminal 2 is a videophone or
still image telephone, the system will generate an
output signal 8 comprising a video image for
transmission to the visual display 20 of the terminal 2.
If the terminal 2 can display more than one
approximation, the estimate may be transmitted by the
network for visual display of more than one
approximation, and the user may be prompted by
synthesized speech, etc., to choose. In the bidirectional
protocol a terminal which does not respond to the prompt
will be considered to not have any visual display 20 and
the output 8 will be in the form of synthesized speech.
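
As a non-limiting sketch, the network-side choice of output format under the bidirectional protocol might look as follows. The `Capabilities` fields, the format labels and the DTMF fallback for a display terminal without a modem are assumptions made for illustration; the source does not define a message format for the terminal's response.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Capabilities:
    """Hypothetical capability report returned by a terminal."""
    display_lines: int = 0       # 0 means no visual display
    has_modem: bool = False      # high speed modem on the line interface
    is_videophone: bool = False  # videophone or still image telephone


def choose_output(caps: Optional[Capabilities]) -> dict:
    """Pick an output format matching the reported capabilities."""
    # A terminal that does not answer the prompt is treated as having
    # no visual display, so the output is synthesized speech.
    if caps is None or (caps.display_lines == 0 and not caps.is_videophone):
        return {"format": "synthesized_speech", "max_approximations": 1}
    if caps.is_videophone:
        fmt = "video_image"
    elif caps.has_modem:
        fmt = "modem_digital"
    else:
        fmt = "dtmf"  # assumed fallback for display terminals without a modem
    # A multi-line display can show several approximations at once.
    return {"format": fmt, "max_approximations": max(1, caps.display_lines)}


silent_terminal = choose_output(None)
modem_terminal = choose_output(Capabilities(display_lines=4, has_modem=True))
```

The `None` argument models a terminal that never responded to the network prompt.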

With a terminal specific protocol, the network 1
stores a table of the identities of each terminal 2 and
utilizes a terminal specific protocol based on the
information on the specific terminal. This approach,
however, would only be effective in a small network
where a network administrator has control over all of
the installed terminals.

With a unidirectional protocol the network
transmits both a digital feedback for visual display and
an audio synthesized speech feedback for audio
presentation to the user at the first terminal 2. The
format is fixed and the specific terminal can ignore or
display the digital feedback for visual display. Of
course, this is the simplest protocol; however, it does
not allow for customization to specific terminals.
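
The unidirectional protocol can be illustrated, purely as a sketch, by a fixed feedback record that always carries both forms, with the terminal deciding locally whether to use the digital part. The names `Feedback`, `build_feedback` and `render`, and the prompt wording, are hypothetical; the speech field holds text standing in for synthesized audio.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Feedback:
    """Fixed-format feedback: both parts are always transmitted."""
    digital_text: str  # digital feedback for visual display
    speech_text: str   # placeholder for the synthesized speech feedback


def build_feedback(approximation: str) -> Feedback:
    """Network side: no knowledge of the terminal is needed."""
    return Feedback(
        digital_text=approximation,
        speech_text=f"Did you say {approximation}?",  # assumed wording
    )


def render(feedback: Feedback, has_display: bool) -> List[Tuple[str, str]]:
    """Terminal side: always play audio; show the digital part if possible."""
    outputs = [("audio", feedback.speech_text)]
    if has_display:
        outputs.append(("display", feedback.digital_text))
    # A terminal without a display simply ignores the digital feedback.
    return outputs


feedback = build_feedback("John Smith")
plain_phone = render(feedback, has_display=False)
display_phone = render(feedback, has_display=True)
```

Because the same record goes to every terminal, the network performs no per-terminal customization, which is the limitation noted above.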

When the presentation is made to the user at the
first terminal 2, the user is able to confirm a correct
estimate. This includes the ability to indicate that a
correct estimate is displayed or request another
alternative if additional alternatives are available. If a
multi-line display, e.g. a CRT display, is used, the
confirmation means includes selection means to select
from the approximations displayed, to scroll down or
bring up a new screen of additional approximations,
etc. Such means includes a keypad 24 or a
microphone 22 for voice input.

Additionally, the feedback can be augmented 
with other information resulting from the query of a 
database with a recognized input or the most closely 
matching approximations of an input. For example, in 
a telephone directory application, the response to an
input signal may include an estimate including the most
closely matching name or names together with the
corresponding telephone numbers. Similarly, in an
exchange request the cost of calling each of the
approximations can be included. In such applications
the confirmation feature can include automatic dialing
of the selected approximation or a request for the
next screenful of approximations.
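
For the telephone directory example, the augmentation step can be sketched as a lookup that pairs each approximation with its directory entry before the estimate is returned to the terminal. The function name `augment_with_numbers` and the directory contents are made up for illustration; the source does not specify the database or its query interface.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Made-up directory entries, used only for this illustration.
DIRECTORY: Dict[str, str] = {
    "John Smith": "555-0100",
    "Jon Smyth": "555-0101",
}


def augment_with_numbers(
    approximations: Sequence[str],
    directory: Dict[str, str],
) -> List[Tuple[str, Optional[str]]]:
    """Attach the telephone number, if any, to each approximation.

    The order of the approximations is preserved, so the most probable
    name is still presented first. Names missing from the directory are
    paired with None.
    """
    return [(name, directory.get(name)) for name in approximations]


augmented = augment_with_numbers(
    ["John Smith", "Jon Smyth", "Joan Smythe"], DIRECTORY
)
```

A confirmation of one of the returned pairs could then trigger automatic dialing of the associated number.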

most probable alternative presented after said
most probable approximation only if the most 
probable approximation has not been confirmed. 

12. A method of error reduction in speech recognition
in a telecommunications network comprising the
steps of providing an estimate of the content of an
input signal placed on the network from a user at
first terminal means back to the first terminal
means, said estimate comprising more than one
approximation of the content of the input signal,
and providing confirmation or selection of a
correct approximation from the first terminal means
onto the network.